Omg finally ready to analyse data! There are still some ambiguities in the tissue assignments, but I think that’s as good as it’s going to get.
So first we want to check how many studies ended up with a mix of race and geography labels, and do some cleaning:
tabulatePops <- by(allSRAFinal, allSRAFinal$SRA.Study, function(x) table(x$finalGeography, x$finalRace))
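# A study is a conflict if its table has at least one geography row AND at least one race column,
# i.e. it uses both kinds of label somewhere. Same result as grepping the dimnames for "NULL", just less fragile.
conflictStudies <- names(tabulatePops)[sapply(tabulatePops, function(x) all(dim(x) > 0))]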
length(conflictStudies) # That's a lot of messiness...
## [1] 36
length(unique(allSRAFinal$SRA.Study))
## [1] 263
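As a sanity check on what "conflict" means here, this is a minimal sketch on a made-up data frame. The `toy` data and the names `usesRace`, `usesGeography` and `mixedStudies` are all invented for illustration; none of it is real SRA metadata. It assumes a study counts as mixed as soon as one sample has a race label and some sample (possibly a different one) has a geography label. That is also why the cross-tabs further down can be full of zeros: the two labels never sit on the same row.

```r
# Toy stand-in for allSRAFinal; every value here is invented.
toy <- data.frame(
  SRA.Study      = c("S1", "S1", "S2", "S2", "S3", "S3"),
  finalRace      = c("White", NA, "White", "Asian", NA, NA),
  finalGeography = c(NA, "Asia", NA, NA, "Europe", "Asia"),
  stringsAsFactors = FALSE
)

# Does each study have at least one non-missing label of each kind?
usesRace      <- tapply(!is.na(toy$finalRace), toy$SRA.Study, any)
usesGeography <- tapply(!is.na(toy$finalGeography), toy$SRA.Study, any)

# Mixed studies use both kinds of label somewhere among their samples.
mixedStudies <- names(usesRace)[usesRace & usesGeography]
mixedStudies
```

Only `S1` should come out as mixed: `S2` is race-only and `S3` is geography-only.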
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(x$finalGeography, x$finalRace))
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
##
## Black or African American White
## East Asia 0 0
## North Africa and Western Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
##
## Other
## Subsaharan Africa 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
##
## Black or African American
## Multiple 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
##
## Black or African American White
## Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
##
## Black or African American
## Asia 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
##
## Asian Black or African American White
## South Asia 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
##
## Black or African American Native Hawaiian or other Pacific Islander Other White
## Asia 0 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
##
## Black or African American White
## South Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
##
## White
## East Asia 0
## South Asia 0
## Southeast Asia 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
##
## Black or African American
## Europe 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
##
## American Indian or Alaskan Native Black or African American Other
## Europe 0 0 0
## South Asia 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
##
## Black or African American
## Europe 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
##
## Black or African American White
## Americas 0 0
## Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
##
## White
## East Asia 0
## South Asia 0
## Southeast Asia 0
## Subsaharan Africa 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
##
## Multiple White
## Subsaharan Africa 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
##
## American Indian or Alaskan Native Asian Black or African American White
## North Africa and Western Asia 0 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
##
## Native Hawaiian or other Pacific Islander White
## Asia 0 0
## North Africa and Western Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
##
## Native Hawaiian or other Pacific Islander White
## Asia 0 0
## North Africa and Western Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
##
## Native Hawaiian or other Pacific Islander White
## Asia 0 0
## North Africa and Western Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
##
## Native Hawaiian or other Pacific Islander White
## Asia 0 0
## North Africa and Western Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
##
## White
## Asia 0
## South Asia 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
##
## Black or African American White
## South Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
##
## Black or African American Multiple
## Asia 0 0
## Europe 0 0
## Multiple 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
##
## White
## East Asia 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
##
## American Indian or Alaskan Native Black or African American White
## Asia 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
##
## Black or African American
## Europe 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
##
## Black or African American White
## South Asia 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
##
## Multiple White
## Subsaharan Africa 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
##
## Black or African American Multiple White
## Asia 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
##
## Asian Black or African American White
## Subsaharan Africa 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
##
## White
## Asia 0
## North Africa and Western Asia 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
##
## Asian Black or African American White
## South Asia 0 0 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
##
## Black or African American
## Europe 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
##
## White
## Americas 0
## Asia 0
## East Asia 0
## South Asia 0
## Subsaharan Africa 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
##
## White
## Asia 0
## Multiple 0
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
##
## Black or African American Other White
## Asia 0 0 0
## Europe 0 0 0
## Multiple 0 0 0
## North Africa and Western Asia 0 0 0
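# Per study, cross-tabulate "geography is missing" (rows) against "race is missing" (columns).
# So there's often a clear skew towards one category, which should make solving these easier...
tabulateTerms <- by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,],
                    allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study,
                    function(x) table(is.na(x$finalGeography), is.na(x$finalRace)))
tabulateTerms <- data.frame(melt(unlist(tabulateTerms)))
# unlist() flattens each 2x2 table column by column, so the four counts per study come out as:
# both labels present, race only, geography only, neither.
tabulateTerms$condition <- rep(c("both", "race", "geography", "neither"), nrow(tabulateTerms)/4)
# Row names are the study ID plus a 1-4 suffix added by unlist(); drop the suffix.
tabulateTerms$SRA.Study <- str_sub(rownames(tabulateTerms), end=-2)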
ggplot(tabulateTerms, aes(x = SRA.Study, y = value, fill=condition)) +
  geom_bar(stat="identity") +
  theme_bw() +
  ggtitle("Race or Ethnicity usage") +
  # xlab("finalSystem") +
  ylab("Count") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme(legend.position="bottom")
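The condition labels in the plot depend on the order in which a 2x2 `is.na` table gets flattened, so here is a rough sketch that checks that order on invented data. The vectors `geography` and `race` and the names `tab` and `counts` are hypothetical. It derives each label from the table's own dimnames rather than from position, which is one possible way to guard against a study whose table is not a full 2x2 (for example, a study where no sample is missing geography).

```r
# Invented labels for one imaginary study; NA means the label is missing.
geography <- c("Asia", NA, NA, NA)
race      <- c(NA, "White", "White", NA)

# Rows: is geography missing? Columns: is race missing?
tab <- table(geoMissing = is.na(geography), raceMissing = is.na(race))

# as.data.frame() keeps the dimnames next to each count, in the same
# column-major order that unlist() would produce.
counts <- as.data.frame(tab, stringsAsFactors = FALSE)

# Name each cell from its dimnames instead of its position.
counts$condition <- with(counts, ifelse(
  geoMissing == "FALSE" & raceMissing == "FALSE", "both",
  ifelse(geoMissing == "TRUE" & raceMissing == "FALSE", "race",
  ifelse(geoMissing == "FALSE" & raceMissing == "TRUE", "geography",
         "neither"))))

counts[, c("condition", "Freq")]
```

On this toy input the conditions come out in the order both, race, geography, neither, which matches the `rep()` labels used for `tabulateTerms$condition`.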
From this it’s very clear what we should do for most studies! The vast majority of them should use only racial terms and (I’m guessing) swap Asian over to a racial descriptor. But anyhow, let’s manually spot-check some of these:
columnsILike <- c(1,10, 15, 34:36) # Just need those two intermediate ones tbh
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) head(x[,columnsILike]))
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 98154 DRP001797 Arabic <NA> <NA> North Africa and Western Asia <NA>
## 98176 DRP001797 Arabic <NA> <NA> North Africa and Western Asia <NA>
## 98300 DRP001797 Arabic <NA> <NA> North Africa and Western Asia <NA>
## 98394 DRP001797 Japanese <NA> <NA> East Asia <NA>
## 98440 DRP001797 Japanese <NA> <NA> East Asia <NA>
## 98443 DRP001797 Japanese <NA> <NA> East Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 188456 ERP116722 Mandingo <NA> <NA> Subsaharan Africa <NA>
## 188457 ERP116722 Jola <NA> <NA> Subsaharan Africa <NA>
## 188459 ERP116722 Mandingo <NA> <NA> Subsaharan Africa <NA>
## 188460 ERP116722 Fulla <NA> <NA> Subsaharan Africa <NA>
## 188461 ERP116722 Fulla <NA> <NA> Subsaharan Africa <NA>
## 188463 ERP116722 Jola <NA> <NA> Subsaharan Africa <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 603830 ERP117085 mixed ancestry <NA> <NA> Multiple <NA>
## 603831 ERP117085 mixed ancestry <NA> <NA> Multiple <NA>
## 603832 ERP117085 mixed ancestry <NA> <NA> Multiple <NA>
## 603834 ERP117085 mixed ancestry <NA> <NA> Multiple <NA>
## 603835 ERP117085 mixed ancestry <NA> <NA> Multiple <NA>
## 603836 ERP117085 mixed ancestry <NA> <NA> Multiple <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 87235 SRP063355 asian <NA> <NA> Asia <NA>
## 87242 SRP063355 asian <NA> <NA> Asia <NA>
## 87226 SRP063355 black <NA> Black or African American <NA> <NA>
## 87227 SRP063355 black <NA> Black or African American <NA> <NA>
## 87228 SRP063355 white <NA> White <NA> <NA>
## 87229 SRP063355 white <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 293582 SRP070663 Asian <NA> <NA> Asia <NA>
## 293583 SRP070663 Asian <NA> <NA> Asia <NA>
## 293584 SRP070663 Asian <NA> <NA> Asia <NA>
## 293585 SRP070663 Asian <NA> <NA> Asia <NA>
## 293586 SRP070663 Asian <NA> <NA> Asia <NA>
## 293587 SRP070663 Asian <NA> <NA> Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 282324 SRP072417 <NA> White.Hispanic White <NA> hispanic
## 282325 SRP072417 <NA> White.Hispanic White <NA> hispanic
## 282326 SRP072417 <NA> White White <NA> <NA>
## 282327 SRP072417 <NA> White White <NA> <NA>
## 282328 SRP072417 <NA> White White <NA> <NA>
## 282329 SRP072417 <NA> White White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 139194 SRP073813 Asian <NA> <NA> Asia <NA>
## 139219 SRP073813 Asian <NA> <NA> Asia <NA>
## 139247 SRP073813 Asian <NA> <NA> Asia <NA>
## 139420 SRP073813 Asian <NA> <NA> Asia <NA>
## 139433 SRP073813 Asian <NA> <NA> Asia <NA>
## 139527 SRP073813 Asian <NA> <NA> Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 189038 SRP074739 South Indian <NA> <NA> South Asia <NA>
## 189040 SRP074739 South Indian <NA> <NA> South Asia <NA>
## 189041 SRP074739 South Indian <NA> <NA> South Asia <NA>
## 189042 SRP074739 South Indian <NA> <NA> South Asia <NA>
## 189043 SRP074739 South Indian <NA> <NA> South Asia <NA>
## 189014 SRP074739 Caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 119193 SRP102952 <NA> Caucasian White <NA> <NA>
## 119195 SRP102952 <NA> Caucasian White <NA> <NA>
## 119013 SRP102952 <NA> Indian <NA> South Asia <NA>
## 119015 SRP102952 <NA> Indian <NA> South Asia <NA>
## 119017 SRP102952 <NA> Chinese <NA> East Asia <NA>
## 119019 SRP102952 <NA> Chinese <NA> East Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 184554 SRP114762 White <NA> <NA> Europe <NA>
## 184555 SRP114762 White <NA> <NA> Europe <NA>
## 184556 SRP114762 White <NA> <NA> Europe <NA>
## 184557 SRP114762 White <NA> <NA> Europe <NA>
## 184558 SRP114762 White <NA> <NA> Europe <NA>
## 184559 SRP114762 White <NA> <NA> Europe <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 589614 SRP116913 White - caucasian / european heritage <NA> <NA> Europe <NA>
## 617071 SRP116913 White - caucasian / european heritage <NA> <NA> Europe <NA>
## 617073 SRP116913 White - caucasian / european heritage <NA> <NA> Europe <NA>
## 617074 SRP116913 White - caucasian / european heritage <NA> <NA> Europe <NA>
## 617076 SRP116913 White - caucasian / european heritage <NA> <NA> Europe <NA>
## 617077 SRP116913 White - caucasian / european heritage <NA> <NA> Europe <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 248113 SRP118614 European Americans <NA> <NA> Europe <NA>
## 248115 SRP118614 European Americans <NA> <NA> Europe <NA>
## 248117 SRP118614 European Americans <NA> <NA> Europe <NA>
## 248119 SRP118614 European Americans <NA> <NA> Europe <NA>
## 248121 SRP118614 European Americans <NA> <NA> Europe <NA>
## 248123 SRP118614 European Americans <NA> <NA> Europe <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 225638 SRP125882 Oriental <NA> <NA> Asia <NA>
## 225640 SRP125882 Latin <NA> <NA> Americas <NA>
## 225613 SRP125882 Caucasian <NA> White <NA> <NA>
## 225614 SRP125882 Black <NA> Black or African American <NA> <NA>
## 225615 SRP125882 Black <NA> Black or African American <NA> <NA>
## 225616 SRP125882 Caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 211874 SRP155483 <NA> <NA> <NA> East Asia <NA>
## 211875 SRP155483 <NA> <NA> <NA> South Asia <NA>
## 211879 SRP155483 <NA> <NA> <NA> East Asia <NA>
## 211882 SRP155483 <NA> <NA> <NA> Southeast Asia <NA>
## 211934 SRP155483 <NA> <NA> <NA> South Asia <NA>
## 211954 SRP155483 <NA> <NA> <NA> Subsaharan Africa <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 677192 SRP172694 African <NA> <NA> Subsaharan Africa <NA>
## 677196 SRP172694 African <NA> <NA> Subsaharan Africa <NA>
## 677186 SRP172694 Asian-Pacificlslander <NA> Multiple <NA> <NA>
## 677188 SRP172694 Caucasian <NA> White <NA> <NA>
## 677204 SRP172694 Asian-Pacificlslander <NA> Multiple <NA> <NA>
## 677208 SRP172694 Hispanic <NA> <NA> <NA> hispanic
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 375085 SRP188296 <NA> Caucasian White <NA> <NA>
## 375086 SRP188296 <NA> Caucasian White <NA> <NA>
## 375087 SRP188296 <NA> Native American American Indian or Alaskan Native <NA> <NA>
## 375088 SRP188296 <NA> Asian Asian <NA> <NA>
## 375089 SRP188296 <NA> Caucasian White <NA> <NA>
## 375090 SRP188296 <NA> African/American Black or African American <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 596096 SRP190479 Asian <NA> <NA> Asia <NA>
## 596130 SRP190479 Middle Eastern <NA> <NA> North Africa and Western Asia <NA>
## 596077 SRP190479 Caucasian <NA> White <NA> <NA>
## 596078 SRP190479 Caucasian <NA> White <NA> <NA>
## 596079 SRP190479 Caucasian <NA> White <NA> <NA>
## 596080 SRP190479 Caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 596482 SRP212343 Asian <NA> <NA> Asia <NA>
## 596558 SRP212343 Middle Eastern <NA> <NA> North Africa and Western Asia <NA>
## 596478 SRP212343 Caucasian <NA> White <NA> <NA>
## 596479 SRP212343 Caucasian <NA> White <NA> <NA>
## 596480 SRP212343 Caucasian <NA> White <NA> <NA>
## 596481 SRP212343 Caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 596572 SRP212369 Middle Eastern <NA> <NA> North Africa and Western Asia <NA>
## 596590 SRP212369 Asian <NA> <NA> Asia <NA>
## 596565 SRP212369 Caucasian <NA> White <NA> <NA>
## 596566 SRP212369 Caucasian <NA> White <NA> <NA>
## 596567 SRP212369 Caucasian <NA> White <NA> <NA>
## 596568 SRP212369 Caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 596639 SRP212370 Middle Eastern <NA> <NA> North Africa and Western Asia <NA>
## 596676 SRP212370 Asian <NA> <NA> Asia <NA>
## 596625 SRP212370 Caucasian <NA> White <NA> <NA>
## 596626 SRP212370 Caucasian <NA> White <NA> <NA>
## 596627 SRP212370 Caucasian <NA> White <NA> <NA>
## 596628 SRP212370 Caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 685098 SRP216558 Asian <NA> <NA> Asia <NA>
## 685100 SRP216558 Indian <NA> <NA> South Asia <NA>
## 685101 SRP216558 Indian <NA> <NA> South Asia <NA>
## 685102 SRP216558 Indian <NA> <NA> South Asia <NA>
## 685103 SRP216558 Indian <NA> <NA> South Asia <NA>
## 685104 SRP216558 Indian <NA> <NA> South Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 660041 SRP219483 <NA> black Black or African American <NA> <NA>
## 660042 SRP219483 <NA> caucasian White <NA> <NA>
## 660043 SRP219483 <NA> caucasian White <NA> <NA>
## 660046 SRP219483 <NA> caucasian White <NA> <NA>
## 660047 SRP219483 <NA> caucasian White <NA> <NA>
## 660048 SRP219483 <NA> caucasian White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 612274 SRP221484 <NA> <NA> <NA> Europe <NA>
## 612277 SRP221484 <NA> <NA> <NA> Europe <NA>
## 612279 SRP221484 <NA> <NA> <NA> Europe <NA>
## 612286 SRP221484 <NA> <NA> <NA> Europe <NA>
## 612287 SRP221484 <NA> <NA> <NA> Europe <NA>
## 612288 SRP221484 <NA> <NA> <NA> Europe <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 562018 SRP226691 <NA> Caucasian White <NA> <NA>
## 562022 SRP226691 <NA> Japanese <NA> East Asia <NA>
## 562026 SRP226691 <NA> Japanese <NA> East Asia <NA>
## 562030 SRP226691 <NA> Japanese <NA> East Asia <NA>
## 562034 SRP226691 <NA> Japanese <NA> East Asia <NA>
## 562038 SRP226691 <NA> Japanese <NA> East Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 556745 SRP245400 Asian <NA> <NA> Asia <NA>
## 556803 SRP245400 Asian <NA> <NA> Asia <NA>
## 556865 SRP245400 Asian <NA> <NA> Asia <NA>
## 556870 SRP245400 Asian <NA> <NA> Asia <NA>
## 556871 SRP245400 Asian <NA> <NA> Asia <NA>
## 556872 SRP245400 Asian <NA> <NA> Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 782356 SRP251118 EUR <NA> <NA> Europe <NA>
## 782358 SRP251118 EUR <NA> <NA> Europe <NA>
## 782359 SRP251118 EUR <NA> <NA> Europe <NA>
## 782360 SRP251118 EUR <NA> <NA> Europe <NA>
## 782367 SRP251118 EUR <NA> <NA> Europe <NA>
## 782369 SRP251118 EUR <NA> <NA> Europe <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 666767 SRP268711 <NA> Caucasian White <NA> <NA>
## 666772 SRP268711 <NA> Caucasian White <NA> <NA>
## 666774 SRP268711 <NA> Caucasian White <NA> <NA>
## 666786 SRP268711 <NA> Black/African American Black or African American <NA> <NA>
## 666793 SRP268711 <NA> Black/African American Black or African American <NA> <NA>
## 666795 SRP268711 <NA> Black/African American Black or African American <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 860020 SRP274641 African <NA> <NA> Subsaharan Africa <NA>
## 860024 SRP274641 African <NA> <NA> Subsaharan Africa <NA>
## 860014 SRP274641 Asian-Pacificlslander <NA> Multiple <NA> <NA>
## 860016 SRP274641 Caucasian <NA> White <NA> <NA>
## 860032 SRP274641 Asian-Pacificlslander <NA> Multiple <NA> <NA>
## 860036 SRP274641 Hispanic <NA> <NA> <NA> hispanic
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 859178 SRP283115 Asian <NA> <NA> Asia <NA>
## 859179 SRP283115 Asian <NA> <NA> Asia <NA>
## 859173 SRP283115 Caucasian <NA> White <NA> <NA>
## 859176 SRP283115 Hispanic/Latino <NA> <NA> <NA> hispanic
## 859181 SRP283115 Caucasian <NA> White <NA> <NA>
## 859182 SRP283115 Black/African American <NA> Black or African American <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 802400 SRP300191 <NA> African American Black or African American <NA> <NA>
## 802401 SRP300191 <NA> Caucasian American White <NA> <NA>
## 802402 SRP300191 <NA> Caucasian American White <NA> <NA>
## 802403 SRP300191 <NA> Caucasian American White <NA> <NA>
## 802404 SRP300191 <NA> African American Black or African American <NA> <NA>
## 802405 SRP300191 <NA> African American Black or African American <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 845507 SRP303641 Asian <NA> <NA> Asia <NA>
## 845508 SRP303641 Asian <NA> <NA> Asia <NA>
## 845523 SRP303641 North african <NA> <NA> North Africa and Western Asia <NA>
## 845524 SRP303641 North african <NA> <NA> North Africa and Western Asia <NA>
## 845533 SRP303641 Asian <NA> <NA> Asia <NA>
## 845534 SRP303641 Asian <NA> <NA> Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 763903 SRP303646 <NA> African American Black or African American <NA> <NA>
## 763904 SRP303646 <NA> White White <NA> <NA>
## 763905 SRP303646 <NA> African American Black or African American <NA> <NA>
## 763906 SRP303646 <NA> African American Black or African American <NA> <NA>
## 763907 SRP303646 <NA> African American Black or African American <NA> <NA>
## 763908 SRP303646 <NA> African American Black or African American <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 1098430 SRP324614 White <NA> <NA> Europe <NA>
## 1098431 SRP324614 White <NA> <NA> Europe <NA>
## 1098432 SRP324614 White <NA> <NA> Europe <NA>
## 1098433 SRP324614 White <NA> <NA> Europe <NA>
## 1098428 SRP324614 Black or African American <NA> Black or African American <NA> <NA>
## 1098429 SRP324614 Black or African American <NA> Black or African American <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 1047027 SRP363798 Asian <NA> <NA> Asia <NA>
## 1047028 SRP363798 Asian <NA> <NA> Asia <NA>
## 1047031 SRP363798 African <NA> <NA> Subsaharan Africa <NA>
## 1047032 SRP363798 African <NA> <NA> Subsaharan Africa <NA>
## 1047033 SRP363798 Asian <NA> <NA> Asia <NA>
## 1047034 SRP363798 Asian <NA> <NA> Asia <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 1164886 SRP377781 asian <NA> <NA> Asia <NA>
## 1164891 SRP377781 mixed <NA> <NA> Multiple <NA>
## 1164900 SRP377781 asian <NA> <NA> Asia <NA>
## 1164860 SRP377781 caucasian <NA> White <NA> <NA>
## 1164861 SRP377781 caucasian <NA> White <NA> <NA>
## 1164862 SRP377781 caucasian <NA> White <NA> <NA>
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
## SRA.Study ETHNICITY RACE finalRace finalGeography hispanic
## 1101373 SRP388678 Asian <NA> <NA> Asia <NA>
## 1101376 SRP388678 Mixed Ethnicity <NA> <NA> Multiple <NA>
## 1101379 SRP388678 Asian <NA> <NA> Asia <NA>
## 1101381 SRP388678 Asian <NA> <NA> Asia <NA>
## 1101384 SRP388678 Middle Eastern <NA> <NA> North Africa and Western Asia <NA>
## 1101387 SRP388678 Middle Eastern <NA> <NA> North Africa and Western Asia <NA>
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(x$RACE))
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
##
## African American Asian Hispanic South Asian White White.Hispanic
## 20 36 14 19 250 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
##
## Bruneian Caucasian Chinese Indian Malay
## 2 2 106 24 32
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
##
## African American African/American Arabic Asian Caucasian Native American
## 1 1 1 15 12 3
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
##
## African American Asian Indian black caucasian
## 1 1 1 7
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
##
## Caucasian Japanese
## 1 12
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
##
## Asian (Nepali) Black/African American Caucasian Hispanic/Latino
## 18 35 52 15
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
##
## African American Asian American Caucasian American Eastern African
## 15 1 15 1
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
##
## African American Asian Casucasian caucaisian Hispanic Indian White
## 23 1 1 1 3 1 7
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
## < table of extent 0 >
by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies,]$SRA.Study, function(x) table(x$ETHNICITY))
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: DRP001797
##
## African american Arabic Caucasian Hispanic Japanese
## 8 3 19 2 10
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP116722
##
## Fulla Jola Mandingo Manjago Other Wollof
## 14 7 15 2 2 4
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: ERP117085
##
## black mixed ancestry
## 28 153
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP063355
##
## asian black white
## 2 5 11
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP070663
##
## African-American Asian Hispanic-Latino
## 3 6 3
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP072417
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP073813
##
## African American Asian Caucasian Other Pacific Islander
## 6 6 260 6 3
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP074739
##
## Africa American Caucasian South Indian
## 5 67 5
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP102952
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP114762
##
## Black White
## 2 16
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP116913
##
## African heritage / african american American indian or alaskan native Asian - central/south asian heritage Other
## 264 14 14 41
## White - caucasian / european heritage
## 305
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP118614
##
## African Americans European Americans
## 16 16
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP125882
##
## Black Caucasian Latin Oriental
## 13 25 1 1
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP155483
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP172694
##
## African Asian-Pacificlslander Caucasian Hispanic
## 2 2 1 1
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP188296
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP190479
##
## Asian Caucasian Middle Eastern Pacific Islander
## 1 56 1 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212343
##
## Asian Caucasian Middle Eastern Pacific Islander
## 1 54 1 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212369
##
## Asian Caucasian Middle Eastern Pacific Islander
## 1 56 1 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP212370
##
## Asian Caucasian Middle Eastern Pacific Islander
## 1 56 1 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP216558
##
## Asian Caucasian Indian
## 1 1 12
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP219483
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP221484
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP226691
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP245400
##
## African American Asian Caucasian Hispanic Native American
## 94 38 188 23 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP251118
##
## AA EUR
## 34 37
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP268711
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP274641
##
## African Asian-Pacificlslander Caucasian Hispanic
## 2 2 1 1
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP283115
##
## Asian Asian/Pacific Black/African Black/African American Caucasian Hispanic/Latino
## 2 2 1 3 6 3
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP300191
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303641
##
## Asian Caucasian Hispanic North african
## 4 92 2 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP303646
## < table of extent 0 >
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP324614
##
## Black or African American White
## 3 4
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP363798
##
## African Asian Asian or Asian British - Indian Bangladeshi Caribbean
## 2 30 2 2 2
## Caucasian Chinese Jamaican
## 50 2 2
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP377781
##
## asian caucasian mixed
## 2 38 1
## ---------------------------------------------------------------------------------------------------------------------------------------
## allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies, ]$SRA.Study: SRP388678
##
## African American/Black Ashkenazi Jewish Asian Caucasian Hispanic Middle Eastern Mixed Ethnicity
## 42 12 66 305 30 9 26
## Other
## 2
On the basis of that, studies with info originally in the race column:
1. SRP072417: Racial, but distinguishes between South Asian and Asian (should just collapse).
1. SRP102952: Caucasian when the rest is clearly from ISEA, should be geography.
1. SRP188296: Single Arabic individual.
1. SRP219483: Single Asian Indian individual, but distinguishes between African American and Black (sigh).
1. SRP226691: Single Caucasian and a lot of Japanese?
1. SRP268711: Asian (Nepali) should be race:Asian?
1. SRP300191: Eastern African distinct from African American.
1. SRP303646: Single Indian individual (and two ways of misspelling Caucasian).
And now the flip side (some of these are updated in the code below):
1. DRP001797: Mixture of terms - Arabic, Japanese, and census terms.
1. ERP116722: Distinct African groups and ‘other’, which is getting parsed incorrectly. Should be all geography.
1. ERP117085: Black vs mixed geography; comes from UCL so would move 100% to ethnicity.
1. SRP063355: Should be all race.
1. SRP070663: Should be all race.
1. SRP073813: Should be all race.
1. SRP074739: 5 South Indian individuals, not clear.
1. SRP114762: Black and white… should keep it consistent no matter what.
1. SRP116913: Should be all race.
1. SRP118614: IDEK
1. SRP125882: IDEK
1. SRP172694, SRP274641: Should be race minus the Hispanic sample - AsianPacific Islander is messing things up, should be Pacific Islander. This one is annoying to fix, so doing it later.
1. SRP190479, SRP212343, SRP212369, SRP212370: Middle Eastern keeping it from being race.
1. SRP216558: IDEK
1. SRP245400: Should be all race minus the Hispanic.
1. SRP251118: Probably race.
1. SRP283115: AsianPacific should probably be Pacific Islander. This one is annoying to fix, so doing it later.
1. SRP303641: North African, otherwise race/Hispanic.
1. SRP324614: Should be race.
1. SRP363798: Caucasian should go to Europe?
1. SRP377781: IDEK
1. SRP388678: IDEK
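Most of the decisions in the lists above boil down to "this study should be all race" or "this study should be all geography". One way to apply that kind of decision is from a small per-study lookup rather than hand-typed study vectors. This is only a sketch on made-up data: the `toy` rows, the `decisions` vector and the `firstNonMissing` helper are all hypothetical, and the helper is a base R stand-in for `dplyr::coalesce`.

```r
# Toy data: made-up studies and labels, not rows from allSRAFinal.
toy <- data.frame(
  SRA.Study      = c("S1", "S1", "S1", "S2", "S2", "S3"),
  finalGeography = c("Asia", NA, NA, NA, "Europe", "East Asia"),
  finalRace      = c(NA, "White", "Black or African American", "White", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical decision table: the column each mixed study should end up in.
decisions <- c(S1 = "finalRace", S2 = "finalGeography")

# Base R stand-in for dplyr::coalesce(): keep `a`, fall back to `b` where `a` is NA.
firstNonMissing <- function(a, b) ifelse(is.na(a), b, a)

for (study in names(decisions)) {
  keep <- decisions[[study]]
  drop <- setdiff(c("finalRace", "finalGeography"), keep)
  rows <- toy$SRA.Study == study
  # Fill the kept column from the other one, row by row, then blank the other one.
  toy[rows, keep] <- firstNonMissing(toy[rows, keep], toy[rows, drop])
  toy[rows, drop] <- NA
}

toy
```

Studies not named in `decisions` (here `S3`) are left alone. The moved labels keep their original wording, so `Asia` ends up in the race column of `S1` and `White` in the geography column of `S2`; they still need recoding to the target vocabulary afterwards.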
Some of the easier fixes, implemented here (should probably do it before, but let’s wait for a reply to my email first). Annoying to do it manually, but here we are.
# I don't trust the ordering to be maintained...
geographyStudies <- allSRAFinal$SRA.Study %in% c("ERP116722", "SRP102952")
allSRAFinal$finalGeography[geographyStudies] <- coalesce(allSRAFinal$finalGeography[geographyStudies], allSRAFinal$finalRace[geographyStudies])
raceStudies <- allSRAFinal$SRA.Study %in% c("SRP063355", "SRP070663", "SRP073813", "SRP116913", "SRP245400", "SRP251118", "SRP324614", "SRP268711", "SRP072417")
allSRAFinal$finalRace[raceStudies] <- coalesce(allSRAFinal$finalRace[raceStudies], allSRAFinal$finalGeography[raceStudies])
# Seems like ordering is maintained, so...
table(allSRAFinal$finalGeography)
##
## Americas Asia East Asia Europe Multiple North Africa and Western Asia
## 164 304 1333 3281 206 27
## Other South Asia Southeast Asia Subsaharan Africa White
## 3 773 83 524 2
table(allSRAFinal$finalRace)
##
## American Indian or Alaskan Native Asia Asian Black or African American
## 83 52 658 2993
## Europe Multiple Native Hawaiian or other Pacific Islander Other
## 346 244 19 278
## South Asia White
## 51 12538
# and now we need to update some of the terms that have changed:
allSRAFinal$finalGeography <- gsub("White", "Europe", allSRAFinal$finalGeography)
allSRAFinal$finalRace <- gsub("South Asia", "Asian", allSRAFinal$finalRace) %>% gsub("Europe", "White", .) %>% gsub("Asia$", "Asian", .) %>% gsub("Subsaharan Africa", "Black or African American", .)
# And set the other descriptor to NA:
allSRAFinal[allSRAFinal$SRA.Study %in% c("ERP116722", "SRP102952"), ]$finalRace <- NA
allSRAFinal[allSRAFinal$SRA.Study %in% c("SRP063355", "SRP070663", "SRP073813", "SRP116913", "SRP245400", "SRP251118", "SRP324614", "SRP268711", "SRP072417"), ]$finalGeography <- NA
# And now let's make the plot again to see if things have improved:
tabulatePops2 <- by(allSRAFinal, allSRAFinal$SRA.Study, function(x) table(x$finalGeography, x$finalRace))
conflictStudies2 <- names(tabulatePops2[lapply(tabulatePops2, function(x) grep("NULL", dimnames(x))) %>% grepl(0, .)])
length(conflictStudies2)
## [1] 25
tabulateTerms2 <- by(allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies2,], allSRAFinal[allSRAFinal$SRA.Study %in% conflictStudies2,]$SRA.Study, function(x) table(is.na(x$finalGeography), is.na(x$finalRace))) # So there's often a clear skew towards one category, which should make solving these easier...
tabulateTerms2 <- data.frame(melt(unlist(tabulateTerms2)))
tabulateTerms2$condition <- rep(c("both", "race", "geography", "neither"), nrow(tabulateTerms2)/4)
tabulateTerms2$SRA.Study <- str_sub(rownames(tabulateTerms2), end=-2)
ggplot(tabulateTerms2, aes(x = SRA.Study, y = value, fill=condition)) +
geom_bar(stat="identity") +
theme_bw() +
ggtitle("Race or Ethnicity usage") +
# xlab("finalSystem") +
ylab("Count") +
scale_fill_brewer(palette = "Set1") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position="bottom")
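The condition labels in that plot depend on the order in which the cells of each 2x2 `table()` come out. On a full 2x2 table the column-major order does match `both, race, geography, neither`, but if a study never hits one of the `TRUE`/`FALSE` levels the table has fewer than four cells and the recycled labels would shift. A sketch of a more explicit alternative, on made-up data (the `toy` data frame and the `conditionCounts` name are hypothetical, not part of the analysis above):

```r
# Toy data with made-up studies; NA means the descriptor was not assigned.
toy <- data.frame(
  SRA.Study      = c("S1", "S1", "S1", "S1", "S2", "S2"),
  finalGeography = c("Asia", NA, "Europe", NA, "Asia", NA),
  finalRace      = c(NA, "White", "White", NA, NA, NA),
  stringsAsFactors = FALSE
)

hasGeo  <- !is.na(toy$finalGeography)
hasRace <- !is.na(toy$finalRace)

# Label each sample explicitly instead of relying on the order of table cells.
toy$condition <- ifelse(hasGeo & hasRace, "both",
                 ifelse(hasRace, "race",
                 ifelse(hasGeo, "geography", "neither")))
toy$condition <- factor(toy$condition,
                        levels = c("both", "race", "geography", "neither"))

# Long-format counts: one row per study and condition, zero counts included.
conditionCounts <- as.data.frame(
  table(SRA.Study = toy$SRA.Study, condition = toy$condition),
  responseName = "value"
)

conditionCounts
```

The result has the same `SRA.Study`, `condition` and `value` columns the plot uses, so it could be passed to the same `ggplot()` call.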
Basic plots of population descriptors, now that we’re happy with that.
sampleGeography <- allSRAFinal %>% count(SRA.Study, finalGeography) %>% drop_na(finalGeography)
sampleRace <- allSRAFinal %>% count(SRA.Study, finalRace) %>% drop_na(finalRace)
meltSampleGeography <- melt(sampleGeography)
## Using SRA.Study, finalGeography as id variables
meltSampleRace <- melt(sampleRace)
## Using SRA.Study, finalRace as id variables
table(allSRAFinal$finalGeography)
##
## Americas Asia East Asia Europe Multiple North Africa and Western Asia
## 164 252 1333 2937 206 27
## Other South Asia Southeast Asia Subsaharan Africa
## 3 722 83 524
table(allSRAFinal$finalRace)
##
## American Indian or Alaskan Native Asian Black or African American Multiple
## 83 761 2993 244
## Native Hawaiian or other Pacific Islander Other White
## 19 276 12882
sum(table(allSRAFinal$finalGeography))
## [1] 6251
sum(table(allSRAFinal$finalRace))
## [1] 17258
ggplot(meltSampleGeography, aes(x = finalGeography, y=value,fill=finalGeography)) +
geom_boxplot(width=0.5, alpha=0.4) +
geom_jitter(size=1, width=0.2) +
theme_bw() +
ggtitle("Geography across all studies") +
# xlab("System") +
ylab("Count") +
scale_y_continuous(trans='log10') +
scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
ggplot(meltSampleGeography, aes(x = finalGeography, y=value,fill=finalGeography)) +
geom_bar(stat="identity", alpha=0.6) +
theme_bw() +
ggtitle("Geography across all studies") +
# xlab("System") +
ylab("Count") +
# scale_y_continuous(trans='log10') +
scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
ggplot(meltSampleRace, aes(x = finalRace, y=value,fill=finalRace)) +
geom_boxplot(width=0.5, alpha=0.4) +
geom_jitter(size=1, width=0.2) +
theme_bw() +
ggtitle("Race across all studies") +
# xlab("System") +
ylab("Count") +
scale_y_continuous(trans='log10') +
scale_fill_viridis(discrete=T, na.value="grey50") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
ggplot(meltSampleRace, aes(x = finalRace, y=value,fill=finalRace)) +
geom_bar(stat="identity", alpha=0.6) +
theme_bw() +
ggtitle("Race across all studies") +
# xlab("System") +
ylab("Count") +
# scale_y_continuous(trans='log10') +
scale_fill_viridis(discrete=T, na.value="grey50") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
Yeah that looks good enough. At this point it might be interesting to see where the studies are coming from… but that is harder to parse than you might expect, so we’ll save that for later.
First, some summary statistics and plots:
# First we focus on population descriptors:
bigGeographySummary <- allSRAFinal %>% count(SRA.Study, finalGeography, finalSystem, finalOrgan)
bigRaceSummary <- allSRAFinal %>% count(SRA.Study, finalRace, finalSystem, finalOrgan)
GeographySummary <- allSRAFinal %>% count(finalGeography, finalSystem, finalOrgan)
raceSummary <- allSRAFinal %>% count(finalRace, finalSystem, finalOrgan)
meltGeography <- melt(GeographySummary)
## Using finalGeography, finalSystem, finalOrgan as id variables
meltRace <- melt(raceSummary)
## Using finalRace, finalSystem, finalOrgan as id variables
meltGeography %>% drop_na(c(finalSystem, finalGeography)) %>%
ggplot(., aes(x = finalSystem, y=value, fill=finalGeography)) +
geom_bar(stat="identity") +
ggtitle("Biological System by Geography") +
# xlab("finalSystem") +
ylab("Count") +
theme_bw() +
scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position="bottom")
meltGeography %>% drop_na(c(finalOrgan, finalGeography)) %>%
ggplot(., aes(x = finalOrgan, y= value, fill=finalGeography)) +
geom_bar(stat="identity") +
theme_bw() +
ggtitle("Organ by Geography") +
# xlab("finalOrgan") +
ylab("Count") +
scale_fill_viridis(discrete=T, na.value="grey50", option="plasma") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position="bottom")
meltGeography %>% drop_na(c(finalOrgan, finalSystem, finalGeography)) %>%
ggplot(., aes(x = finalOrgan, y= value, color=finalGeography, fill=finalGeography)) +
geom_jitter(size = 4, width=0.2) +
theme_bw() +
ggtitle("Organ by geography") +
# xlab("Organ") +
ylab("Count") +
scale_y_continuous(trans='log10') +
scale_color_viridis(discrete=T, na.value="grey50", option="plasma") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "bottom")
meltGeography %>% drop_na(c(finalSystem, finalGeography)) %>%
ggplot(., aes(x = finalSystem, y= value, color=finalGeography, fill=finalGeography)) +
geom_jitter(size = 4, width=0.2) +
theme_bw() +
ggtitle("System by geography") +
# xlab("System") +
ylab("Count") +
scale_y_continuous(trans='log10') +
scale_color_viridis(discrete=TRUE, na.value="grey50", option="plasma") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "bottom")
meltRace %>% drop_na(c(finalSystem, finalRace)) %>%
ggplot(., aes(x = finalSystem, y= value, fill=finalRace)) +
geom_bar(stat="identity") +
theme_bw() +
ggtitle("Biological System by Race") +
# xlab("finalSystem") +
ylab("Count") +
scale_fill_viridis(discrete=TRUE, na.value="grey50") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position="bottom")
meltRace %>% drop_na(c(finalOrgan, finalRace)) %>%
ggplot(., aes(x = finalOrgan, y= value, fill=finalRace)) +
geom_bar(stat="identity") +
theme_bw() +
ggtitle("Organ by Race") +
# xlab("finalOrgan") +
ylab("Count") +
scale_fill_viridis(discrete=TRUE, na.value="grey50") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position="bottom")
meltRace %>% drop_na(c(finalOrgan, finalSystem, finalRace)) %>%
ggplot(., aes(x = finalOrgan, y= value, color=finalRace, fill=finalRace)) +
geom_jitter(size = 4, width=0.2) +
theme_bw() +
ggtitle("Organ by race") +
# xlab("Organ") +
ylab("Count") +
scale_y_continuous(trans='log10') +
scale_color_viridis(discrete=TRUE, na.value="grey50") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "bottom")
meltRace %>% drop_na(c(finalSystem, finalRace)) %>%
ggplot(., aes(x = finalSystem, y= value, color=finalRace, fill=finalRace)) +
geom_jitter(size = 4, width=0.2) +
theme_bw() +
ggtitle("System by race") +
# xlab("System") +
ylab("Count") +
scale_y_continuous(trans='log10') +
scale_color_viridis(discrete=TRUE, na.value="grey50") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "bottom")
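The four stacked bar plots above share one skeleton, so a small helper could cut the repetition. This is only a minimal sketch, assuming the melted summaries keep a `value` column as they do here; `plotStackedCounts` and the toy data frame are made up for illustration and are not part of the original analysis.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(viridis)

# Hypothetical helper: stacked bar chart of sample counts.
# `df` is a melted summary with a `value` column; `xVar` and `fillVar`
# are column names passed as strings.
plotStackedCounts <- function(df, xVar, fillVar, title, palette = "viridis") {
  df %>%
    drop_na(all_of(c(xVar, fillVar))) %>%
    ggplot(aes(x = .data[[xVar]], y = value, fill = .data[[fillVar]])) +
    geom_bar(stat = "identity") +
    ggtitle(title) +
    xlab(xVar) +
    ylab("Count") +
    labs(fill = fillVar) +
    theme_bw() +
    scale_fill_viridis(discrete = TRUE, na.value = "grey50", option = palette) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
          legend.position = "bottom")
}

# Toy data standing in for meltGeography (values invented for illustration)
toyMelt <- data.frame(
  finalGeography = c("Europe", "East Asia", NA, "Europe"),
  finalSystem = c("circulatory", "circulatory", "digestive", NA),
  value = c(10, 5, 3, 2)
)

p <- plotStackedCounts(toyMelt, "finalSystem", "finalGeography",
                       "Biological System by Geography", palette = "plasma")
```

With the real objects, the call would look like `plotStackedCounts(meltRace, "finalOrgan", "finalRace", "Organ by Race")`. The jitter plots would need a second helper (or a `geom` argument), which is left out to keep the sketch short.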
We’re also interested in how diverse a given study is (how many distinct descriptors it contains), and in how many studies include samples from each descriptor. Both are easy to calculate, although they are hard to see visually.
geographyStudy <- allSRAFinal %>% count(SRA.Study, finalGeography) %>% drop_na(finalGeography)
# How many study x geography combinations (rows), and how many studies with any sort of Geography info?
dim(geographyStudy)
## [1] 136 3
length(unique(geographyStudy$SRA.Study))
## [1] 94
# And some quick stats on diversity by study:
geographyStudy %>% count(SRA.Study) %>% summary()
##   SRA.Study               n
##  Length:94          Min.   :1.000
##  Class :character   1st Qu.:1.000
##  Mode  :character   Median :1.000
##                     Mean   :1.447
##                     3rd Qu.:1.750
##                     Max.   :5.000
# But... how many samples?
ggplot(geographyStudy, aes(x = finalGeography, y=n, fill=SRA.Study)) +
geom_bar(stat="identity") +
ggtitle("Geography by Study") +
ylab("Count") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
# This is hard to see, so:
ddply(geographyStudy, "finalGeography", summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
##                   finalGeography totalN     meanN maxN nStudies
## 1                       Americas    164  20.50000   39        8
## 2                           Asia    252  13.26316   66       19
## 3                      East Asia   1333  40.39394  208       33
## 4                         Europe   2937 122.37500  753       24
## 5                       Multiple    206  41.20000  153        5
## 6  North Africa and Western Asia     27   2.70000    9       10
## 7                          Other      3   1.50000    2        2
## 8                     South Asia    722  48.13333  365       15
## 9                 Southeast Asia     83  27.66667   48        3
## 10             Subsaharan Africa    524  30.82353  129       17
# Now adding the finalOrgan dimension
geographyStudyfinalOrgan <- allSRAFinal %>% count(SRA.Study, finalGeography, finalOrgan) %>% drop_na(finalGeography, finalOrgan)
ggplot(geographyStudyfinalOrgan, aes(x = finalGeography, y=n, fill=SRA.Study)) +
geom_bar(stat="identity") +
facet_wrap( ~ finalOrgan, nrow=3) +
ggtitle("Geography by study by organ") +
ylab("Count") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
ggplot(geographyStudyfinalOrgan, aes(x = finalOrgan, y=n, fill=SRA.Study)) +
geom_bar(stat="identity") +
facet_wrap( ~ finalGeography, nrow=3) +
ggtitle("Organ by study by geography") +
ylab("Count") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
# And now really broken down:
ddply(geographyStudyfinalOrgan, c("finalGeography", "finalOrgan"), summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
## finalGeography finalOrgan totalN meanN maxN nStudies
## 1 Americas blood 37 18.500000 36 2
## 2 Americas bone marrow 27 27.000000 27 1
## 3 Americas brain 15 15.000000 15 1
## 4 Americas iPSC 42 21.000000 24 2
## 5 Americas joint 4 4.000000 4 1
## 6 Americas muscle 39 39.000000 39 1
## 7 Asia blood 69 23.000000 66 3
## 8 Asia brain 4 1.000000 1 4
## 9 Asia heart 4 4.000000 4 1
## 10 Asia intestine 5 5.000000 5 1
## 11 Asia iPSC 72 24.000000 39 3
## 12 Asia joint 30 30.000000 30 1
## 13 Asia liver 48 16.000000 34 3
## 14 Asia lung 4 4.000000 4 1
## 15 Asia testis 1 1.000000 1 1
## 16 Asia thyroid 15 15.000000 15 1
## 17 East Asia bladder 10 10.000000 10 1
## 18 East Asia blastoderm 22 22.000000 22 1
## 19 East Asia blood 497 35.500000 182 14
## 20 East Asia blood vessel 30 15.000000 20 2
## 21 East Asia bone 4 4.000000 4 1
## 22 East Asia bone marrow 63 21.000000 23 3
## 23 East Asia cancer 15 15.000000 15 1
## 24 East Asia heart 106 106.000000 106 1
## 25 East Asia intestine 361 120.333333 208 3
## 26 East Asia iPSC 12 12.000000 12 1
## 27 East Asia joint 2 2.000000 2 1
## 28 East Asia kidney 20 20.000000 20 1
## 29 East Asia liver 50 25.000000 32 2
## 30 East Asia morula 41 41.000000 41 1
## 31 East Asia muscle 40 40.000000 40 1
## 32 East Asia skin 60 60.000000 60 1
## 33 Europe blood 969 107.666667 375 9
## 34 Europe breast 23 23.000000 23 1
## 35 Europe heart 72 24.000000 46 3
## 36 Europe intestine 258 129.000000 210 2
## 37 Europe iPSC 884 176.800000 330 5
## 38 Europe muscle 687 343.500000 671 2
## 39 Europe prostate 16 16.000000 16 1
## 40 Europe skin 24 24.000000 24 1
## 41 Multiple blood 180 60.000000 153 3
## 42 Multiple intestine 26 13.000000 25 2
## 43 North Africa and Western Asia blood 17 5.666667 9 3
## 44 North Africa and Western Asia blood vessel 4 2.000000 3 2
## 45 North Africa and Western Asia brain 4 1.000000 1 4
## 46 North Africa and Western Asia lung 2 2.000000 2 1
## 47 Other blood 2 2.000000 2 1
## 48 Other heart 1 1.000000 1 1
## 49 South Asia blood 672 84.000000 365 8
## 50 South Asia heart 26 13.000000 24 2
## 51 South Asia intestine 5 5.000000 5 1
## 52 South Asia iPSC 2 2.000000 2 1
## 53 South Asia joint 4 4.000000 4 1
## 54 South Asia skin 1 1.000000 1 1
## 55 South Asia testis 12 12.000000 12 1
## 56 Southeast Asia blood 49 24.500000 48 2
## 57 Southeast Asia heart 34 34.000000 34 1
## 58 Subsaharan Africa blood 507 42.250000 129 12
## 59 Subsaharan Africa heart 1 1.000000 1 1
## 60 Subsaharan Africa intestine 10 10.000000 10 1
## 61 Subsaharan Africa joint 2 2.000000 2 1
## 62 Subsaharan Africa lung 4 2.000000 2 2
These make sense! The Singaporean cohorts will be the three main geographies there: (South) Chinese, Malay and Tamil. Ambry is of course invested in diversity, and TB is unlikely to show up in Europe. Also, I am willing to bet any amount of money that the Asian digestive sequencing somehow comes from cancer samples too.
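A side note on the summary tables: they are built with `ddply()` from plyr, which is retired and tends to mask dplyr functions when both packages are loaded. A dplyr-only version of the same per-descriptor summary could look like the sketch below. It assumes dplyr 1.0 or later (for `.groups`), and the toy data frame is invented to stand in for `geographyStudy`.

```r
library(dplyr)

# Toy stand-in for geographyStudy: one row per study x geography,
# with n = number of samples (values invented for illustration)
toyStudy <- data.frame(
  SRA.Study = c("S1", "S1", "S2", "S3"),
  finalGeography = c("Europe", "Asia", "Europe", "Europe"),
  n = c(10, 2, 30, 5)
)

# Same columns as the ddply() call: total, mean and max samples per
# study, plus the number of distinct studies for each geography
toySummary <- toyStudy %>%
  group_by(finalGeography) %>%
  summarise(totalN = sum(n),
            meanN = mean(n),
            maxN = max(n),
            nStudies = n_distinct(SRA.Study),
            .groups = "drop")
toySummary
```

For the organ-level breakdown, the only change would be `group_by(finalGeography, finalOrgan)`.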
Anyhow, now we do the same for race:
raceStudy <- allSRAFinal %>% count(SRA.Study, finalRace) %>% drop_na(finalRace)
# How many study x race combinations (rows), and how many studies with any sort of Race info?
dim(raceStudy)
## [1] 368 3
length(unique(raceStudy$SRA.Study))
## [1] 190
# And some quick stats on diversity by study:
raceStudy %>% count(SRA.Study) %>% summary()
##   SRA.Study               n
##  Length:190         Min.   :1.000
##  Class :character   1st Qu.:1.000
##  Mode  :character   Median :2.000
##                     Mean   :1.937
##                     3rd Qu.:2.000
##                     Max.   :7.000
# But... how many samples?
ggplot(raceStudy, aes(x = finalRace, y=n, fill=SRA.Study)) +
geom_bar(stat="identity") +
ggtitle("Race by Study") +
ylab("Count") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
# This is hard to see, so:
ddply(raceStudy, "finalRace", summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
##                                   finalRace totalN     meanN maxN nStudies
## 1         American Indian or Alaskan Native     83  6.384615   40       13
## 2                                     Asian    761 17.697674  112       43
## 3                 Black or African American   2993 29.930000  355      100
## 4                                  Multiple    244 13.555556   63       18
## 5 Native Hawaiian or other Pacific Islander     19  2.375000    5        8
## 6                                     Other    276 17.250000   64       16
## 7                                     White  12882 75.776471 1337      170
# Now adding the finalOrgan dimension
raceStudyfinalOrgan <- allSRAFinal %>% count(SRA.Study, finalRace, finalOrgan) %>% drop_na(finalRace, finalOrgan)
ggplot(raceStudyfinalOrgan, aes(x = finalRace, y=n, fill=SRA.Study)) +
geom_bar(stat="identity") +
facet_wrap( ~ finalOrgan, nrow=3) +
ggtitle("Race by study by organ") +
ylab("Count") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
ggplot(raceStudyfinalOrgan, aes(x = finalOrgan, y=n, fill=SRA.Study)) +
geom_bar(stat="identity") +
facet_wrap( ~ finalRace, nrow=3) +
ggtitle("Organ by study by race") +
ylab("Count") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
theme(legend.position = "none")
# And now really broken down:
ddply(raceStudyfinalOrgan, c("finalRace", "finalOrgan"), summarise, totalN = sum(n), meanN=mean(n), maxN=max(n), nStudies=length(unique((SRA.Study))))
## finalRace finalOrgan totalN meanN maxN nStudies
## 1 American Indian or Alaskan Native blood 70 10.000000 40 7
## 2 American Indian or Alaskan Native blood vessel 2 2.000000 2 1
## 3 American Indian or Alaskan Native breast 2 1.000000 1 2
## 4 American Indian or Alaskan Native cancer 5 2.500000 3 2
## 5 American Indian or Alaskan Native heart 1 1.000000 1 1
## 6 American Indian or Alaskan Native nose 3 3.000000 3 1
## 7 Asian blood 544 22.666667 112 24
## 8 Asian blood vessel 15 15.000000 15 1
## 9 Asian bone marrow 1 1.000000 1 1
## 10 Asian brain 6 6.000000 6 1
## 11 Asian breast 47 15.666667 36 3
## 12 Asian cancer 13 13.000000 13 1
## 13 Asian heart 1 1.000000 1 1
## 14 Asian intestine 9 9.000000 9 1
## 15 Asian iPSC 69 17.250000 55 4
## 16 Asian nose 3 3.000000 3 1
## 17 Asian oral cavity 12 12.000000 12 1
## 18 Asian ovary 20 6.666667 11 3
## 19 Asian PNS 9 9.000000 9 1
## 20 Asian skin 2 2.000000 2 1
## 21 Asian uterus 10 10.000000 10 1
## 22 Black or African American adipose 22 22.000000 22 1
## 23 Black or African American blood 1815 41.250000 355 44
## 24 Black or African American blood vessel 9 4.500000 8 2
## 25 Black or African American bone marrow 12 6.000000 9 2
## 26 Black or African American brain 278 25.272727 87 11
## 27 Black or African American breast 126 25.200000 41 5
## 28 Black or African American cancer 59 14.750000 53 4
## 29 Black or African American CNS 2 2.000000 2 1
## 30 Black or African American heart 177 35.400000 124 5
## 31 Black or African American intestine 173 43.250000 135 4
## 32 Black or African American iPSC 22 11.000000 20 2
## 33 Black or African American kidney 1 1.000000 1 1
## 34 Black or African American liver 17 8.500000 16 2
## 35 Black or African American lung 46 15.333333 36 3
## 36 Black or African American lymph node 20 20.000000 20 1
## 37 Black or African American muscle 13 13.000000 13 1
## 38 Black or African American nose 29 14.500000 26 2
## 39 Black or African American oral cavity 19 9.500000 18 2
## 40 Black or African American ovary 4 2.000000 3 2
## 41 Black or African American pituitary gland 3 3.000000 3 1
## 42 Black or African American placenta 32 32.000000 32 1
## 43 Black or African American PNS 3 3.000000 3 1
## 44 Black or African American prostate 48 24.000000 32 2
## 45 Black or African American skin 27 5.400000 13 5
## 46 Black or African American tonsil 1 1.000000 1 1
## 47 Black or African American urinary tract 4 4.000000 4 1
## 48 Black or African American uterus 21 10.500000 19 2
## 49 Black or African American vagina 8 8.000000 8 1
## 50 Multiple adipose 1 1.000000 1 1
## 51 Multiple bladder 1 1.000000 1 1
## 52 Multiple blood 212 21.200000 63 10
## 53 Multiple cancer 12 4.000000 5 3
## 54 Multiple heart 2 2.000000 2 1
## 55 Multiple intestine 3 1.500000 2 2
## 56 Multiple lung 5 1.666667 2 3
## 57 Multiple muscle 1 1.000000 1 1
## 58 Multiple nose 4 4.000000 4 1
## 59 Multiple spleen 1 1.000000 1 1
## 60 Multiple stomach 1 1.000000 1 1
## 61 Multiple thymus 1 1.000000 1 1
## 62 Native Hawaiian or other Pacific Islander blood 8 2.666667 5 3
## 63 Native Hawaiian or other Pacific Islander brain 11 2.200000 3 5
## 64 Other blood 263 21.916667 64 12
## 65 Other brain 6 6.000000 6 1
## 66 Other cancer 1 1.000000 1 1
## 67 Other nose 3 3.000000 3 1
## 68 Other ovary 3 3.000000 3 1
## 69 White adipose 84 28.000000 54 3
## 70 White adrenal gland 3 1.500000 2 2
## 71 White bladder 1 1.000000 1 1
## 72 White blood 7916 134.169492 1337 59
## 73 White blood vessel 115 16.428571 48 7
## 74 White bone marrow 50 12.500000 22 4
## 75 White brain 1357 61.681818 260 22
## 76 White breast 300 37.500000 148 8
## 77 White cancer 620 124.000000 501 5
## 78 White cartilage 1 1.000000 1 1
## 79 White CNS 13 13.000000 13 1
## 80 White digestive tract 2 2.000000 2 1
## 81 White eye 47 23.500000 31 2
## 82 White heart 349 34.900000 237 10
## 83 White intestine 166 27.666667 67 6
## 84 White iPSC 502 45.636364 252 11
## 85 White joint 50 50.000000 50 1
## 86 White kidney 6 2.000000 3 3
## 87 White larynx 1 1.000000 1 1
## 88 White liver 192 96.000000 191 2
## 89 White lung 241 20.083333 92 12
## 90 White lymph node 2 2.000000 2 1
## 91 White muscle 42 21.000000 40 2
## 92 White nose 72 18.000000 29 4
## 93 White oral cavity 46 23.000000 30 2
## 94 White ovary 194 32.333333 115 6
## 95 White pancreas 2 2.000000 2 1
## 96 White pituitary gland 4 4.000000 4 1
## 97 White prostate 143 47.666667 94 3
## 98 White skin 160 14.545455 48 11
## 99 White spleen 2 2.000000 2 1
## 100 White stomach 3 1.500000 2 2
## 101 White testis 3 1.000000 1 3
## 102 White thyroid 20 10.000000 19 2
## 103 White tonsil 5 5.000000 5 1
## 104 White trachea 12 12.000000 12 1
## 105 White urinary tract 47 47.000000 47 1
## 106 White uterus 19 6.333333 16 3
## 107 White vagina 8 8.000000 8 1
Wowowowow, the difference is stark. It would be nice to slice this by country of sampling to see if this is really driven by the USA, or if random people are using race terms…
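If a country-of-sampling field can be recovered from the metadata, the slice could be sketched as below. Note that `samplingCountry` is a hypothetical column name (nothing above confirms that `allSRAFinal` has one), and the toy data frame is invented purely to show the shape of the calculation.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for allSRAFinal; `samplingCountry` is a HYPOTHETICAL column
toySRA <- data.frame(
  SRA.Study = c("S1", "S1", "S2", "S2", "S3", "S4"),
  finalRace = c("White", "Asian", "White", NA, "White", "Asian"),
  samplingCountry = c("USA", "USA", "USA", "USA", "UK", NA)
)

raceByCountry <- toySRA %>%
  drop_na(finalRace) %>%
  # Keep samples without a country, but label them explicitly
  mutate(samplingCountry = replace_na(samplingCountry, "unknown")) %>%
  group_by(samplingCountry, finalRace) %>%
  summarise(nSamples = n(),
            nStudies = n_distinct(SRA.Study),
            .groups = "drop") %>%
  # Share of each race label contributed by each country
  group_by(finalRace) %>%
  mutate(propOfRace = nSamples / sum(nSamples)) %>%
  ungroup()
raceByCountry
```

A `propOfRace` close to 1 for the USA across most labels would support the idea that race terms are mostly a US reporting habit. A sizeable share elsewhere would point to other groups using them too.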